127 research outputs found

    Mining Electronic Health Records to Validate Knowledge in Pharmacogenomics

    The state of the art in pharmacogenomics (PGx) is based on a bank of knowledge resulting from sporadic observations, and so is not considered to be statistically valid. The PractiKPharma project is mining data from electronic health record repositories and composing novel cohorts of patients in order to confirm (or moderate) pharmacogenomics knowledge on the basis of observations made in clinical practice.

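    Prédiction de défauts dans les arbres du parc végétal Grenoblois et préconisations pour les futures plantations [Predicting defects in the trees of Grenoble's urban tree stock and recommendations for future plantings]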

    In this article we describe our response to the EGC 2017 challenge. An exploratory data analysis first allowed us to understand the distributions of the different variables and to detect strong correlations. We defined two additional variables derived from the variables in the dataset. Several supervised classification algorithms were tested on task 1 of the challenge. Performance was evaluated by cross-validation, which allowed us to select the best single-label and multi-label classifiers. On both the single-label and the multi-label task, the best classifier exceeds the baselines by about 2%. We also explored task 2 of the challenge. On the one hand, we searched for association rules. On the other hand, the dataset was enriched with knowledge such as climate data (rainfall, temperature, wind) and botanical taxonomic data (family, order, superorder). In addition, geographic and cartographic data are used in a tool for visualising part of the tree data.

    Unsupervised Extra Trees: a stochastic approach to compute similarities in heterogeneous data.

    In this paper we present a method to compute similarities on unlabeled data, based on extremely randomized trees. The main idea of our method, Unsupervised Extremely Randomized Trees (UET), is to randomly split the data in an iterative fashion until a stopping criterion is met, and to compute a similarity based on the co-occurrence of samples in the leaves of each generated tree. Using a tree-based approach to compute similarities is interesting, as the inherent [...] We evaluate our method on synthetic and real-world datasets by comparing the mean similarities between samples with the same label and the mean similarities between samples with different labels. These metrics are similar to intracluster and intercluster similarities, and are used to assess the computed similarities instead of a clustering algorithm's results. Our empirical study shows that the method effectively gives distinct similarity values between samples belonging to different clusters, and gives indiscernible values when there is no cluster structure. We also assess some interesting properties such as invariance under monotone transformations of variables and robustness to correlated variables and noise. Finally, we performed hierarchical agglomerative clustering on synthetic and real-world homogeneous and heterogeneous datasets using UET versus standard similarity measures. Our experiments show that the algorithm outperforms existing methods in some cases, and can reduce the amount of preprocessing needed with many real-world datasets.
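
The idea in the abstract above (random splits until a stopping criterion, then similarity from leaf co-occurrence) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the numeric-only features, the uniform cut point, and the `min_leaf` stopping rule are all assumptions made for illustration.

```python
import random

def random_partition(rows, indices, min_leaf, rng):
    """Split `indices` recursively with random cuts and return the leaves."""
    if len(indices) <= min_leaf:  # assumed stopping criterion: small node
        return [indices]
    features = list(range(len(rows[0])))
    rng.shuffle(features)  # try features in random order
    for f in features:
        lo = min(rows[i][f] for i in indices)
        hi = max(rows[i][f] for i in indices)
        if lo == hi:
            continue  # feature is constant in this node, cannot split on it
        cut = rng.uniform(lo, hi)  # random cut point (assumption)
        left = [i for i in indices if rows[i][f] < cut]
        right = [i for i in indices if rows[i][f] >= cut]
        if left and right:
            return (random_partition(rows, left, min_leaf, rng)
                    + random_partition(rows, right, min_leaf, rng))
    return [indices]  # no usable split found: keep the node as a leaf

def leaf_cooccurrence_similarity(rows, n_trees=100, min_leaf=3, seed=0):
    """Similarity of two samples = fraction of trees where they share a leaf."""
    rng = random.Random(seed)
    n = len(rows)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_trees):
        for leaf in random_partition(rows, list(range(n)), min_leaf, rng):
            for i in leaf:
                for j in leaf:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]
```

Because each split only compares values against a threshold, a similarity of this kind depends on the ordering of values rather than their scale, which is consistent with the invariance property the abstract mentions.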

    Clustering graphs using random trees

    In this work-in-progress paper, we present GraphTrees, a novel method that relies on random decision trees to compute pairwise distances between vertices in a graph. We show that, for non-attributed graphs, our approach is competitive with state-of-the-art methods in terms of clustering quality. By extending an already ubiquitous approach (random trees) to graphs, our proposed approach opens new research directions that leverage decades of research on this topic.

    An Experimental Evaluation of Similarity-Based and Embedding-Based Link Prediction Methods on Graphs

    The task of inferring missing links or predicting future ones in a graph based on its current structure is referred to as link prediction. Link prediction methods based on pairwise node similarity are well established in the literature and show good prediction performance on many real-world graphs, even though they are heuristic. Graph embedding approaches, on the other hand, learn low-dimensional representations of the nodes in a graph and are capable of capturing inherent graph features, and thus support the subsequent link prediction task. This paper studies a selection of methods from both categories on several benchmark (homogeneous) graphs with different properties from various domains. Beyond the intra- and inter-category comparison of the methods' performance, our aim is also to uncover interesting connections between Graph Neural Network (GNN)-based methods and heuristic ones as a means to alleviate the well-known black-box limitation.
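
To make "pairwise node similarity" concrete, here is a small sketch of three classic heuristics (common neighbours, Jaccard coefficient, Adamic-Adar). The abstract does not say which heuristics the paper evaluates, so this selection, the adjacency-dictionary representation, and the function names are assumptions.

```python
import math
from itertools import combinations

def common_neighbors(adj, u, v):
    """Number of neighbours shared by u and v."""
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    """Shared neighbours divided by the size of the combined neighbourhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    """Shared neighbours weighted by the inverse log of their degree."""
    return sum(1.0 / math.log(len(adj[w]))
               for w in adj[u] & adj[v] if len(adj[w]) > 1)

def rank_candidates(adj, score):
    """Rank all currently unlinked node pairs by descending score."""
    pairs = [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]
    return sorted(pairs, key=lambda p: (-score(adj, *p), p))
```

Here `adj` maps each node to the set of its neighbours in an undirected graph; the top-ranked pairs are the predicted links.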

    Extraction de données pharmacogénomiques à partir d'études cliniques : problématique [Extracting pharmacogenomic data from clinical studies: problem statement]

    The importance of individual variations in drug response is becoming a significant problem for both pharmaceutical research and medical practice. Our research project aims to integrate clinical and genetic data from clinical studies in order to extract knowledge about the relationships between a particular genotype and its influence on the effect of a drug. To address this problem, we are looking for mining methods that are suited to the biomedical data we wish to handle and that can integrate domain knowledge in the form of an ontology. This project is the subject of a PhD thesis that began in November 2004.

    Kbdock - Searching and organising the structural space of protein-protein interactions

    Big data is a recurring problem in structural bioinformatics, where even a single experimentally determined protein structure can contain several different interacting protein domains and often involves many tens of thousands of 3D atomic coordinates. If we consider all protein structures that have ever been solved, the immense structural space of protein-protein interactions needs to be organised systematically in order to make sense of the many functional and evolutionary relationships that exist between different protein families and their interactions. This article describes some new developments in Kbdock, a knowledge-based approach for classifying and annotating protein interactions at the protein domain level.

    NRPS toolbox for the discovery of new nonribosomal peptides and synthetases

    Nonribosomal peptide synthetases are huge multi-enzymatic complexes that synthesize peptides, but not through the classical process of transcription followed by translation. The synthetases are organised in modules, each one integrating an amino acid into the final peptide. The modules are divided into domains that provide specialized activities. As a result, these enzymes are as diverse as their products. We present our toolbox, designed to annotate them accurately, along with promising results obtained on some Burkholderia, Bacillus and Pseudomonas genomes.

    Formal Concept Analysis Applied to Transcriptomic Data

    Identifying functions or pathways shared by genes responsible for cancer is still a challenging task. This paper describes the preparation work for applying Formal Concept Analysis (FCA) to biological data. After gene transcription experiments, we integrate various annotations of selected genes in a database along with relevant domain knowledge. The database subsequently makes it possible to build formal contexts in a flexible way. We present here a preliminary experiment using these data on a core context, with domain knowledge added by context apposition. The resulting concept lattices are pruned and we discuss some interesting concepts. Our study shows how data integration and FCA can help the domain expert in the exploration of complex data.
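
For readers unfamiliar with FCA, the following sketch shows the two derivation operators, a naive enumeration of formal concepts, and context apposition (joining two contexts that share the same objects). It is a textbook-style illustration, not the paper's pipeline: the function names are hypothetical, and the enumeration is exponential in the number of objects, so it only suits toy contexts.

```python
from itertools import combinations

def intent(context, objs):
    """Attributes shared by every object in `objs` (all attributes if empty)."""
    shared = set().union(*context.values())
    for o in objs:
        shared &= context[o]
    return frozenset(shared)

def extent(context, attrs):
    """Objects that carry every attribute in `attrs`."""
    return frozenset(o for o, a in context.items() if attrs <= a)

def formal_concepts(context):
    """Naive enumeration: close every subset of objects (toy sizes only)."""
    found = set()
    objs = sorted(context)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            b = intent(context, subset)
            found.add((extent(context, b), b))
    return found

def apposition(ctx_a, ctx_b):
    """Join two contexts over the same objects by uniting their attributes."""
    return {o: set(ctx_a[o]) | set(ctx_b[o]) for o in ctx_a}
```

In this representation a context is a dictionary from object (for example a gene identifier) to its set of attributes (for example annotations); each concept is an (extent, intent) pair of frozensets.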

    BR-Explorer: A sound and complete FCA-based retrieval algorithm (Poster)

    In this paper we present BR-Explorer, a sound and complete algorithm for retrieving biological data sources, based on Formal Concept Analysis and domain ontologies. BR-Explorer addresses the problem of retrieving the relevant data sources for a given query. Initially, a formal context representing the relation between biological data sources and their metadata is provided and its corresponding concept lattice is built. BR-Explorer then generates the formal concept for the considered query and inserts it into the provided concept lattice. The next step is to locate the "pivot" concept in the resulting concept lattice. Based on this pivot concept, BR-Explorer builds the result step by step by considering the pivot's superconcepts in the resulting concept lattice until the top concept is reached. Finally, BR-Explorer provides the set of relevant data sources ranked according to their relevance with respect to the considered query. An ontology-based query refinement procedure is integrated in BR-Explorer. This procedure takes advantage of semantic information about the data source metadata and the queries to improve the BR-Explorer results.